
updates for 25.12 #997

Merged
eordentlich merged 9 commits into NVIDIA:main from eordentlich:eo_25.12_nightly on Dec 16, 2025

Conversation

@eordentlich
Collaborator

@eordentlich eordentlich commented Dec 9, 2025

Mainly attempts to align with some breaking cuML changes, including:

  • logistic regression
    • the training objective is no longer computed in cuML (for consistency).
    • datasets with a single label value are no longer trainable in logistic regression.
    • L-BFGS params now need to be passed to the constructor.
  • some cuML model fields need to be set differently for some algorithms at inference time.
  • KMeansMG is back.
  • pylibraft, dask, and related packages are no longer default dependencies of cuML.
  • the scipy minimum is now 1.11, which raised some issues on Databricks 13.3.
  • Also fixes some DBSCAN tests that can fail in multi-GPU settings.
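Since cuML no longer reports the training objective, it has to be recomputed externally on the Spark side. A minimal sketch of what such a computation looks like for binary logistic regression (illustrative only; not the actual spark_rapids_ml.metrics.utils code, and the function name and signature are assumptions):

```python
import numpy as np

def logistic_objective(X, y, coef, intercept=0.0, reg_param=0.0):
    """Mean binary log-loss plus (reg_param/2)*||coef||^2, labels y in {0, 1}."""
    z = X @ coef + intercept
    # Signed margins: positive when the prediction agrees with the label.
    margins = np.where(y == 1, z, -z)
    # Numerically stable log(1 + exp(-m)) via logaddexp, averaged over rows.
    loss = np.logaddexp(0.0, -margins).mean()
    return loss + 0.5 * reg_param * float(np.dot(coef, coef))
```

With an all-zero coefficient vector, every row contributes log(2), which gives a quick sanity check for implementations of this kind.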

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
@greptile-apps
Contributor

greptile-apps bot commented Dec 9, 2025

Greptile Summary

This PR updates the entire spark-rapids-ml codebase to align with cuML 25.12 nightly builds, addressing multiple breaking changes in the upstream RAPIDS ecosystem. The changes include comprehensive version updates across all components (Python package, JVM artifacts, Docker images, documentation) from 25.10.0 to 25.12.0, along with infrastructure updates to handle cuML's new dependency model.

Key technical changes include: adapting LogisticRegression to pass LBFGS parameters to the constructor instead of post-initialization and removing automatic objective computation; migrating KMeans from the deprecated class to KMeansMG; adding explicit installation of previously bundled dependencies (pylibraft, raft-dask) across all deployment scripts; updating model field initialization patterns for PCA, UMAP, and RandomForest to work with cuML's new internal structure; and enhancing test reliability by fixing DBSCAN tests that could fail in multi-GPU environments.

The PR maintains API compatibility for end users while adapting to significant internal changes in cuML 25.12, ensuring the spark-rapids-ml library continues to provide GPU-accelerated drop-in replacements for Spark ML algorithms.

Important Files Changed

| Filename | Score | Overview |
|---|---|---|
| python/src/spark_rapids_ml/classification.py | 4/5 | Major LogisticRegression updates for cuML 25.12: constructor parameter changes, single-label exception handling, removed objective computation |
| python/src/spark_rapids_ml/clustering.py | 4/5 | Migrated KMeans from the deprecated class to KMeansMG, removed the multigpu parameter, updated model construction patterns |
| python/src/spark_rapids_ml/regression.py | 4/5 | Updated LinearRegression defaults, reorganized imports, fixed model field initialization for n_features_in_ and n_cols |
| python/src/spark_rapids_ml/umap.py | 4/5 | Updated UMAP model construction with new attribute names, parameter handling, and embedding array wrapping for cuML 25.12 |
| python/tests/test_logistic_regression.py | 4/5 | Comprehensive test updates: new objective utility import, parameter validation changes, constructor updates, error handling changes |
| notebooks/databricks/init-pip-cuda-12.sh | 4/5 | CUDA upgrade to 12.2.2, RAPIDS 25.12.0 update, explicit dependency installation, numpy pinning for Databricks compatibility |
| python/src/spark_rapids_ml/feature.py | 5/5 | Fixed PCA dtype access, corrected a typo, updated n_features_in_ initialization for cuML 25.12 compatibility |
| python/src/spark_rapids_ml/metrics/utils.py | 5/5 | New utility module implementing the logistic regression objective calculation, since cuML no longer computes it automatically |
| python/src/spark_rapids_ml/tree.py | 5/5 | Fixed RandomForest model reconstruction by switching from CuPy to NumPy arrays for classes_ attribute initialization |
| python/tests/test_dbscan.py | 5/5 | Improved multi-GPU test reliability by adding explicit ID fields and sorting operations for deterministic results |
| python/src/spark_rapids_ml/connect_plugin.py | 5/5 | Removed the objective attribute from LogisticRegression serialization, since cuML 25.12 no longer computes it |
| docker/Dockerfile.pip | 5/5 | Updated CUDA to 12.2.2 and RAPIDS to 25.12.0, added explicit pylibraft and raft-dask dependencies |
| docker/Dockerfile.python | 5/5 | CUDA upgrade, variable rename to RAPIDS_VERSION, explicit RAPIDS component installation |
| ci/Dockerfile | 5/5 | CI environment update: CUDA 12.2.2, RAPIDS 25.12, explicit dependency installation |
| python/pyproject.toml | 5/5 | Simple version bump from 25.10.0 to 25.12.0 in package metadata |
| python/src/spark_rapids_ml/__init__.py | 5/5 | Package version update from 25.10.0 to 25.12.0 |
| jvm/pom.xml | 5/5 | Maven artifact version update from 25.10.0 to 25.12.0 |
| docs/source/conf.py | 5/5 | Sphinx documentation version update to 25.12.0 |

Confidence score: 4/5

  • This PR requires careful review due to extensive breaking changes across multiple algorithms and infrastructure components
  • Score reflects the comprehensive nature of changes affecting LogisticRegression parameter handling, KMeans algorithm migration, dependency management, and model field initialization patterns that could impact runtime behavior
  • Pay close attention to classification.py, clustering.py, regression.py, and umap.py which contain complex logic changes affecting model training and inference

Sequence Diagram

sequenceDiagram
    participant Developer
    participant Docker as Docker Build
    participant cuML as cuML 25.12
    participant LogReg as Logistic Regression
    participant Tests as Test Suite
    participant Docs as Documentation

    Developer->>Docker: Update RAPIDS_VERSION to 25.12
    Docker->>Docker: Install cuml=25.12 cuvs=25.12 pylibraft=25.12
    Docker->>Docker: Update scipy minimum to 1.11
    
    Developer->>LogReg: Handle cuML breaking changes
    LogReg->>LogReg: Remove training objective computation
    LogReg->>LogReg: Handle single label value cases
    LogReg->>LogReg: Update lbfgs parameter handling
    LogReg->>cuML: Set model fields for inference
    
    Developer->>Tests: Fix DBSCAN multi-GPU issues
    Tests->>Tests: Update test configurations
    Tests->>Tests: Handle sparse data optimization
    
    Developer->>Docs: Update version references
    Docs->>Docs: Update notebook examples
    Docs->>Docs: Refresh API documentation
    
    Developer->>Tests: Run full test suite
    Tests->>cuML: Validate cuML 25.12 compatibility
    Tests->>LogReg: Verify logistic regression fixes
    Tests->>Tests: Confirm DBSCAN stability
    
    Tests-->>Developer: All tests passing
    Developer->>Developer: Finalize PR for 25.12 release

Contributor

@greptile-apps greptile-apps bot left a comment


Additional Comments (3)

  1. python/benchmark/databricks/gpu_cluster_spec.sh, line 58

    logic: The init script init-pip-cuda-13.0-nightly.sh does not exist in the repository. This will cause cluster creation to fail. The available script is init-pip-cuda-12.0.sh.

  2. python/benchmark/databricks/gpu_etl_cluster_spec.sh, line 69

    logic: The init script init-pip-cuda-12.0-nightly.sh does not exist in the repository. This will cause cluster creation to fail. The available script is init-pip-cuda-12.0.sh.

  3. python/README.md, line 25

    syntax: The $RAPIDS_VERSION shell variable is used but not defined in this context. This command won't work as documented.
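The usual fix for comment 3 is simply to define the variable before the command that expands it; an illustrative shell sketch (the version value and package name here are assumptions, not taken from the repository):

```shell
# Define RAPIDS_VERSION before any command that expands it.
RAPIDS_VERSION=25.12.0

# A command like the README's can then expand it safely:
echo "pip install cuml-cu12==${RAPIDS_VERSION}"

# Fail fast with a clear message if the variable is unset or empty:
: "${RAPIDS_VERSION:?RAPIDS_VERSION must be set}"
```

Documentation that uses such a variable typically either defines it inline like this or tells the reader which value to export first.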

32 files reviewed, 3 comments


@eordentlich
Collaborator Author

build

@eordentlich
Collaborator Author

build

Contributor

@greptile-apps greptile-apps bot left a comment


32 files reviewed, no comments


@eordentlich
Collaborator Author

build

Contributor

@greptile-apps greptile-apps bot left a comment


Additional Comments (2)

  1. notebooks/databricks/init-pip-cuda-12.sh, line 46

    logic: The numpy~=1.0 constraint is too broad and will install numpy 1.x, which could include very old incompatible versions. It should specify a more restrictive minimum version. What specific numpy version range is required for compatibility with RAPIDS 25.12 and Databricks 13.3?

  2. python/README.md

    logic: The $RAPIDS_VERSION variable is referenced but not defined in the conda command. Users will get an error unless this variable is set.

    Should this use a specific version number like the other packages, or should there be instructions to define RAPIDS_VERSION first?
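PEP 440's compatible-release operator makes the first comment concrete: numpy~=1.0 expands to >=1.0, ==1.*, so every 1.x release is allowed, including very old ones. A quick check using the third-party packaging library:

```python
# Demonstrates why "~=1.0" is called overly permissive: per PEP 440 it is
# equivalent to ">=1.0, ==1.*", i.e. it admits the entire 1.x series.
from packaging.specifiers import SpecifierSet

loose = SpecifierSet("~=1.0")
print("1.0.0" in loose)    # the very first 1.x is allowed
print("1.26.4" in loose)   # so is every later 1.x
print("2.0.0" in loose)    # only 2.x and beyond are excluded

# The shape of constraint the reviewer suggests instead:
tight = SpecifierSet(">=1.21,<2.0")
print("1.19.0" in tight)   # too old, now rejected
```

The tighter form pins both ends of the range, which is usually what "compatible with the RAPIDS ecosystem" requires.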

35 files reviewed, 2 comments


@eordentlich
Collaborator Author

build

Contributor

@greptile-apps greptile-apps bot left a comment


Additional Comments (4)

  1. python/README.md, line 25

    style: The $RAPIDS_VERSION variable is used but not defined in this documentation; users may be confused about what value to substitute.

    Should the documentation include instructions on how to set or determine the appropriate RAPIDS_VERSION value?

  2. notebooks/databricks/init-pip-cuda-12.sh, line 46

    style: The numpy version constraint ~=1.0 is extremely permissive and may not address the scipy 1.11 compatibility issue mentioned in the PR description. Should this be a more specific numpy version constraint to ensure compatibility with scipy 1.11 requirements?

  3. python/src/spark_rapids_ml/metrics/utils.py, line 56

    style: The is True comparison is unnecessary; if lr_model.getStandardization(): would be more pythonic.

  4. python/src/spark_rapids_ml/regression.py, line 806

    style: Setting n_cols twice appears redundant; lines 801 and 806 both set the same attribute.

    Is the duplicate assignment on line 806 intentional for compatibility reasons, or should one of these be removed?
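Item 3's suggestion in miniature (LRModel here is a hypothetical stand-in, not the real PySpark model class):

```python
# Minimal stand-in for a model whose getter returns a plain bool.
class LRModel:
    def getStandardization(self) -> bool:
        return True

lr_model = LRModel()

# Unidiomatic: identity comparison against the True singleton.
standardize = lr_model.getStandardization() is True

# Idiomatic: rely on truthiness directly. For a bool-returning getter the
# two are equivalent, and this form also tolerates truthy non-bool returns.
standardize = bool(lr_model.getStandardization())
```

The `is True` form only differs in behavior when the getter can return a truthy non-True object, which a boolean Spark param getter does not.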

36 files reviewed, 4 comments


@eordentlich eordentlich changed the title from "updates for 25.12 nightly" to "updates for 25.12" on Dec 12, 2025
@eordentlich
Collaborator Author

build

Contributor

@greptile-apps greptile-apps bot left a comment


Additional Comments (4)

  1. python/benchmark/databricks/cpu_cluster_spec.sh, line 27

    style: The spot_bid_price_percent parameter is now unused since availability is ON_DEMAND.

  2. notebooks/databricks/init-pip-cuda-12.sh, line 45

    logic: The numpy version constraint ~=1.0 seems overly permissive and may allow incompatible versions. Should this be a more restrictive constraint like numpy>=1.21,<2.0 to ensure compatibility with the RAPIDS ecosystem?

  3. python/src/spark_rapids_ml/regression.py, line 809

    style: Redundant assignment of lr.n_cols, since it was already set on line 804.

  4. python/src/spark_rapids_ml/classification.py, lines 1074-1091

    style: The exception handling properly catches cuML's new single-label restriction, but the string-matching approach using traceback.format_exc() is fragile and could break if cuML's error messages change.
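One way to avoid the fragile traceback-string matching flagged in item 4 is to detect the degenerate single-label case before calling into cuML at all. A hypothetical sketch (fit_with_label_guard, fit_fn, and the dict return value are illustrative, not code from this PR):

```python
import numpy as np

def fit_with_label_guard(fit_fn, X, y):
    """Check label cardinality up front instead of parsing cuML's error
    text after the fact, which breaks whenever the message wording changes."""
    classes = np.unique(y)
    if classes.size < 2:
        # Degenerate case: describe a constant predictor for the lone class
        # rather than invoking a trainer that would reject the input.
        return {"single_class": classes[0].item()}
    return fit_fn(X, y)
```

Here fit_fn stands in for the real training call; the guard makes the single-label branch explicit and independent of upstream exception messages.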

36 files reviewed, 4 comments


Collaborator

@rishic3 rishic3 left a comment


👍

@eordentlich eordentlich merged commit ddfa00b into NVIDIA:main Dec 16, 2025
4 checks passed
@eordentlich eordentlich deleted the eo_25.12_nightly branch December 16, 2025 21:03
@YanxuanLiu YanxuanLiu mentioned this pull request Dec 17, 2025
YanxuanLiu added a commit that referenced this pull request Dec 17, 2025
default CUDA version has been updated in #997

Update image tag to use latest docker image to run CI.

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>